Search | WHO COVID-19 Research Database

TransformEHRs: a flexible methodology for building transparent ETL processes for EHR reuse.

Pedrera-Jiménez, Miguel; García-Barrio, Noelia; Rubio-Mayo, Paula; Tato-Gómez, Alberto; Cruz-Bermúdez, Juan Luis; Bernal-Sobrino, José Luis; Muñoz-Carrero, Adolfo; Serrano-Balazote, Pablo.

Methods Inf Med ; 2022 Oct 11.

Article in English | MEDLINE | ID: covidwho-2062350

ABSTRACT

BACKGROUND: During the COVID-19 pandemic, several methodologies were designed for obtaining electronic health record (EHR)-derived datasets for research. These processes are often based on black boxes, on which clinical researchers are unaware of how the data were recorded, extracted, and transformed. In order to solve this, it is essential that extract, transform, and load (ETL) processes are based on transparent, homogeneous, and formal methodologies, making them understandable, reproducible, and auditable. OBJECTIVES: This study aims to design and implement a methodology, according with FAIR Principles, for building ETL processes (focused on data extraction, selection, and transformation) for EHR reuse in a transparent and flexible manner, applicable to any clinical condition and health care organization. METHODS: The proposed methodology comprises four stages: (1) analysis of secondary use models and identification of data operations, based on internationally used clinical repositories, case report forms, and aggregated datasets; (2) modeling and formalization of data operations, through the paradigm of the Detailed Clinical Models; (3) agnostic development of data operations, selecting SQL and R as programming languages; and (4) automation of the ETL instantiation, building a formal configuration file with XML. RESULTS: First, four international projects were analyzed to identify 17 operations, necessary to obtain datasets according to the specifications of these projects from the EHR. With this, each of the data operations was formalized, using the ISO 13606 reference model, specifying the valid data types as arguments, inputs and outputs, and their cardinality. Then, an agnostic catalog of data was developed through data-oriented programming languages previously selected. Finally, an automated ETL instantiation process was built from an ETL configuration file formally defined. CONCLUSIONS: This study has provided a transparent and flexible solution to the difficulty of making the processes for obtaining EHR-derived data for secondary use understandable, auditable, and reproducible. Moreover, the abstraction carried out in this study means that any previous EHR reuse methodology can incorporate these results into them.

Obtaining EHR-derived datasets for COVID-19 research within a short time: a flexible methodology based on Detailed Clinical Models.

Pedrera-Jiménez, Miguel; García-Barrio, Noelia; Cruz-Rojo, Jaime; Terriza-Torres, Ana Isabel; López-Jiménez, Elena Ana; Calvo-Boyero, Fernando; Jiménez-Cerezo, María Jesús; Blanco-Martínez, Alvar Javier; Roig-Domínguez, Gustavo; Cruz-Bermúdez, Juan Luis; Bernal-Sobrino, José Luis; Serrano-Balazote, Pablo; Muñoz-Carrero, Adolfo.

J Biomed Inform ; 115: 103697, 2021 03.

Article in English | MEDLINE | ID: covidwho-1062445

ABSTRACT

BACKGROUND: COVID-19 ranks as the single largest health incident worldwide in decades. In such a scenario, electronic health records (EHRs) should provide a timely response to healthcare needs and to data uses that go beyond direct medical care and are known as secondary uses, which include biomedical research. However, it is usual for each data analysis initiative to define its own information model in line with its requirements. These specifications share clinical concepts, but differ in format and recording criteria, something that creates data entry redundancy in multiple electronic data capture systems (EDCs) with the consequent investment of effort and time by the organization. OBJECTIVE: This study sought to design and implement a flexible methodology based on detailed clinical models (DCM), which would enable EHRs generated in a tertiary hospital to be effectively reused without loss of meaning and within a short time. MATERIAL AND METHODS: The proposed methodology comprises four stages: (1) specification of an initial set of relevant variables for COVID-19; (2) modeling and formalization of clinical concepts using ISO 13606 standard and SNOMED CT and LOINC terminologies; (3) definition of transformation rules to generate secondary use models from standardized EHRs and development of them using R language; and (4) implementation and validation of the methodology through the generation of the International Severe Acute Respiratory and emerging Infection Consortium (ISARIC-WHO) COVID-19 case report form. This process has been implemented into a 1300-bed tertiary Hospital for a cohort of 4489 patients hospitalized from 25 February 2020 to 10 September 2020. RESULTS: An initial and expandable set of relevant concepts for COVID-19 was identified, modeled and formalized using ISO-13606 standard and SNOMED CT and LOINC terminologies. Similarly, an algorithm was designed and implemented with R and then applied to process EHRs in accordance with standardized concepts, transforming them into secondary use models. Lastly, these resources were applied to obtain a data extract conforming to the ISARIC-WHO COVID-19 case report form, without requiring manual data collection. The methodology allowed obtaining the observation domain of this model with a coverage of over 85% of patients in the majority of concepts. CONCLUSION: This study has furnished a solution to the difficulty of rapidly and efficiently obtaining EHR-derived data for secondary use in COVID-19, capable of adapting to changes in data specifications and applicable to other organizations and other health conditions. The conclusion to be drawn from this initial validation is that this DCM-based methodology allows the effective reuse of EHRs generated in a tertiary Hospital during COVID-19 pandemic, with no additional effort or time for the organization and with a greater data scope than that yielded by conventional manual data collection process in ad-hoc EDCs.

Subject(s)

COVID-19/pathology , Datasets as Topic , Electronic Health Records , Algorithms , COVID-19/epidemiology , COVID-19/virology , Cohort Studies , Humans , Logical Observation Identifiers Names and Codes , SARS-CoV-2/isolation & purification , Systematized Nomenclature of Medicine

ABSTRACT

ABSTRACT

Subject(s)

SEND TO:

SELECTION OF CITATIONS

SEARCH DETAIL